Sign Stable Random Projections for Large-Scale Learning
Abstract
In this paper, we study the use of “sign α-stable random projections” (where 0 < α ≤ 2) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and near-neighbor search). After processing by sign stable random projections, the inner products of the processed data approximate various types of nonlinear kernels depending on the value of α. Thus, this approach provides an effective strategy for approximating nonlinear learning algorithms essentially at the cost of linear learning. When α = 2, it is known that the corresponding nonlinear kernel is the arc-cosine kernel. When α = 1, the procedure approximates the arc-cos-χ kernel (under certain conditions). When α → 0+, it corresponds to the resemblance kernel, which provides an exciting connection between two popular randomized algorithms: (i) stable random projections and (ii) b-bit minwise hashing. No theoretical results are known so far for other α values except for α = 2, 1, or 0+.
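For α = 2, the sign-collision behavior described above is the classic SimHash result behind the arc-cosine kernel connection: two signs agree with probability 1 − θ/π, where θ is the angle between the vectors. A minimal NumPy sketch (the vectors, dimension, and projection count are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 100, 20000          # data dimension, number of projections

# Two illustrative data vectors (hypothetical inputs).
x = rng.normal(size=d)
y = x + 0.5 * rng.normal(size=d)

# alpha = 2: i.i.d. Gaussian entries are 2-stable.
R = rng.normal(size=(d, k))

# Keep only the signs of the projected data.
sx, sy = np.sign(x @ R), np.sign(y @ R)

# Empirical probability that the two signs agree.
collision = np.mean(sx == sy)

# Known closed form for alpha = 2: Pr[signs agree] = 1 - theta / pi.
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
expected = 1 - theta / np.pi
```

With k on the order of 10^4 projections, the empirical collision rate matches the closed form to within a few thousandths.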
Related papers
Sign Stable Projections, Sign Cauchy Projections and Chi-Square Kernels
The method of stable random projections is popular for efficiently computing the lα distances in high dimension (where 0 < α ≤ 2), using small space. Because it adopts nonadaptive linear projections, this method is naturally suitable when the data are collected in a dynamic streaming fashion (i.e., turnstile data streams). In this paper, we propose to use only the signs of the projected data an...
Sign Cauchy Projections and Chi-Square Kernel
The method of stable random projections is useful for efficiently approximating the lα distance (0 < α ≤ 2) in high dimension and it is naturally suitable for data streams. In this paper, we propose to use only the signs of the projected data and we analyze the probability of collision (i.e., when the two signs differ). Interestingly, when α = 1 (i.e., Cauchy random projections), we show that t...
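The α = 1 case referenced in the snippet above uses Cauchy (1-stable) projections; for nonnegative data, the sign collision probability is closely approximated by a function of the chi-square similarity. A hedged sketch of that comparison (data and sizes are illustrative, and 1 − arccos(ρ_χ²)/π is the approximation studied in the cited paper, not an exact identity):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 100, 20000          # data dimension, number of projections

# Nonnegative data normalized to sum 1, as assumed for the
# chi-square similarity (illustrative inputs).
x = rng.random(d)
x /= x.sum()
y = x * (1 + 0.2 * rng.random(d))
y /= y.sum()

# alpha = 1: i.i.d. standard Cauchy entries are 1-stable.
R = rng.standard_cauchy(size=(d, k))
sx, sy = np.sign(x @ R), np.sign(y @ R)
collision = np.mean(sx == sy)

# Chi-square similarity: rho = sum_i 2 x_i y_i / (x_i + y_i).
rho = np.sum(2 * x * y / (x + y))

# Approximation from the sign Cauchy projections line of work:
# Pr[signs agree] is close to 1 - arccos(rho)/pi.
approx = 1 - np.arccos(np.clip(rho, -1.0, 1.0)) / np.pi
```

For similar nonnegative vectors, ρ_χ² is close to 1 and the empirical collision rate tracks the approximation closely.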
Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections
The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the lα norm of a single data stream, or the lα distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...
Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second,...
Using Stable Random Projections
Many tasks (e.g., clustering) in machine learning only require the lα distances instead of the original data. For dimension reductions in the lα norm (0 < α ≤ 2), the method of stable random projections can efficiently compute the lα distances in massive datasets (e.g., the Web or massive data streams) in one pass of the data. The estimation task for stable random projections has been ...
Journal: CoRR
Volume: abs/1504.07235
Pages: -
Year: 2015